Prioritizing individual genetic variants after kernel machine testing using variable selection.
نویسندگان
چکیده
Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as a group of SNPs within a gene or a pathway) and are able to identify sets of SNPs that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusion only at the SNP-set level, and does not directly inform on which one(s) of the identified SNP set is actually driving the associations. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs, and adapt the KNIFE procedure to genetic association studies and propose an approach to identify driver SNPs after the application of SKAT to gene set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practically useful utilities to prioritize SNPs, and fills the gap between SNP set analysis and biological functional studies. Both simulation studies and real data application are used to demonstrate the proposed approach.
منابع مشابه
Kernel methods for large-scale genomic data analysis
Machine learning, particularly kernel methods, has been demonstrated as a promising new tool to tackle the challenges imposed by today's explosive data growth in genomics. They provide a practical and principled approach to learning how a large number of genetic variants are associated with complex phenotypes, to help reveal the complexity in the relationship between the genetic markers and the...
متن کاملApplication of genetic algorithm (GA) to select input variables in support vector machine (SVM) for analyzing the occurrence of roach, Rutilus rutilus, in streams
Support vector machine (SVM) was used to analyze the occurrence of roach in Flemish stream basins (Belgium). Several habitat and physico?chemical variables were used as inputs for the model development. The biotic variable merely consisted of abundance data which was used for predicting presence/absence of roach. Genetic algorithm (GA) was combined with SVM in order to select the most important...
متن کاملRegion-Based Association Analysis of Human Quantitative Traits in Related Individuals
Regional-based association analysis instead of individual testing of each SNP was introduced in genome-wide association studies to increase the power of gene mapping, especially for rare genetic variants. For regional association tests, the kernel machine-based regression approach was recently proposed as a more powerful alternative to collapsing-based methods. However, the vast majority of exi...
متن کاملDevelopment of a Pharmacogenomics Model based on Support Vector Regression with Optimal Features Selection Approach to Determine the Initial Therapeutic Dose of Warfarin Anticoagulant Drug
Introduction: Using artificial intelligence tools in pharmacogenomics is one of the latest bioinformatics research fields. One of the most important drugs that determining its initial therapeutic dose is difficult is the anticoagulant warfarin. Warfarin is an oral anticoagulant that, due to its narrow therapeutic window and complex interrelationships of individual factors, the selection of its ...
متن کاملDevelopment of a Pharmacogenomics Model based on Support Vector Regression with Optimal Features Selection Approach to Determine the Initial Therapeutic Dose of Warfarin Anticoagulant Drug
Introduction: Using artificial intelligence tools in pharmacogenomics is one of the latest bioinformatics research fields. One of the most important drugs that determining its initial therapeutic dose is difficult is the anticoagulant warfarin. Warfarin is an oral anticoagulant that, due to its narrow therapeutic window and complex interrelationships of individual factors, the selection of its ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Genetic epidemiology
دوره 40 8 شماره
صفحات -
تاریخ انتشار 2016